{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "EZBwDI7xiVu5"
   },
   "source": [
    "# 1- Class Activation Map with convolutions\n",
    "\n",
    "In this first part, we will code class activation map as described in the paper [Learning Deep Features for Discriminative Localization](http://cnnlocalization.csail.mit.edu/)\n",
    "\n",
    "There is a GitHub repo associated with the paper:\n",
    "https://github.com/zhoubolei/CAM\n",
    "\n",
    "And even a demo in PyTorch:\n",
    "https://github.com/zhoubolei/CAM/blob/master/pytorch_CAM.py\n",
    "\n",
    "The code below is adapted from this demo but we will not use hooks only convolutions..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "DNUU0eixiVvB"
   },
   "outputs": [],
   "source": [
    "import io\n",
    "import requests\n",
    "from PIL import Image\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "from torchvision import models, transforms\n",
    "from torch.nn import functional as F\n",
    "import torch.optim as optim\n",
    "import numpy as np\n",
    "import cv2\n",
    "import pdb\n",
    "from matplotlib.pyplot import imshow\n",
    "\n",
    "\n",
    "# input image\n",
    "LABELS_URL = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'\n",
    "IMG_URL = 'http://media.mlive.com/news_impact/photo/9933031-large.jpg'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As in the demo, we will use the Resnet18 architecture. In order to get CAM, we need to transform this network in a fully convolutional network: at all layers, we need to deal with images, i.e. with a shape $\\text{Number of channels} \\times W\\times H$ . In particular, we are interested in the last images as shown here:\n",
    "![](https://camo.githubusercontent.com/fb9a2d0813e5d530f49fa074c378cf83959346f7/687474703a2f2f636e6e6c6f63616c697a6174696f6e2e637361696c2e6d69742e6564752f6672616d65776f726b2e6a7067)\n",
    "\n",
    "As we deal with a Resnet18 architecture, the image obtained before applying the `AdaptiveAvgPool2d` has size $512\\times 7 \\times 7$ if the input has size $3\\times 224\\times 224 $:\n",
    "![resnet_Archi](https://pytorch.org/assets/images/resnet.png)\n",
    "\n",
    "A- The first thing, you will need to do is 'removing' the last layers of the resnet18 model which are called `(avgpool)` and `(fc)`. Check that for an original image of size $3\\times 224\\times 224 $, you obtain an image of size $512\\times 7\\times 7$.\n",
    "\n",
    "B- Then you need to retrieve the weights (and bias) of the `fc` layer, i.e. a matrix of size $1000\\times 512$ transforming a vector of size 512 into a vector of size 1000 to make the prediction. Then you need to use these weights and bias to apply it pixelwise in order to transform your $512\\times 7\\times 7$ image into a $1000\\times 7\\times 7$ output (Hint: use a convolution). You can interpret this output as follows: `output[i,j,k]` is the logit for 'pixel' `[j,k]` for being of class `i`.\n",
    "\n",
    "C- From this $1000\\times 7\\times 7$ output, check that you can retrieve the original output given by the `resnet18` by using an `AdaptiveAvgPool2d`. Can you understand why this is true?\n",
    "\n",
    "D- In addition, you can construct the Class Activation Map. Draw the activation map for the class mountain bike, for the class lakeside."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validation:\n",
    "1. make sure that when running your notebook, you display both CAM for the class mountain bike and for the class lakeside.\n",
    "2. for question B above, what convolution did you use? Your answer, i.e. the name of the Pytorch layer with the correct parameters (in_channel,kernel...) here:\n",
    "\n",
    "<span style=\"color:red\">Replace by your answer</span>\n",
    "3. your short explanation of why your network gives the same prediction as the original `resnet18`:\n",
    "\n",
    "<span style=\"color:red\">Replace by your answer</span>\n",
    "4. Is your network working on an image which is not of size $224\\times 224$, i.e. without resizing? and what about `resnet18`? Explain why?\n",
    "\n",
    "<span style=\"color:red\">Replace by your answer</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "cMgoA17fiVvT"
   },
   "outputs": [],
   "source": [
    "net = models.resnet18(pretrained=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "net.eval()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.randn(5, 3, 224, 224)\n",
    "y = net(x)\n",
    "y.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "quQMxt1-iVxL"
   },
   "outputs": [],
   "source": [
    "n_mean = [0.485, 0.456, 0.406]\n",
    "n_std = [0.229, 0.224, 0.225]\n",
    "\n",
    "normalize = transforms.Normalize(\n",
    "   mean=n_mean,\n",
    "   std=n_std\n",
    ")\n",
    "preprocess = transforms.Compose([\n",
    "   transforms.Resize((224,224)),\n",
    "   transforms.ToTensor(),\n",
    "   normalize\n",
    "])\n",
    "\n",
    "# Display the image we will use.\n",
    "response = requests.get(IMG_URL)\n",
    "img_pil = Image.open(io.BytesIO(response.content))\n",
    "imshow(img_pil);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_tensor = preprocess(img_pil)\n",
    "net = net.eval()\n",
    "logit = net(img_tensor.unsqueeze(0))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "logit.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_tensor.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# download the imagenet category list\n",
    "classes = {int(key):value for (key, value)\n",
    "          in requests.get(LABELS_URL).json().items()}\n",
    "\n",
    "\n",
    "def print_preds(logit):\n",
    "    # print the predictions with their 'probabilities' from the logit\n",
    "    h_x = F.softmax(logit, dim=1).data.squeeze()\n",
    "    probs, idx = h_x.sort(0, True)\n",
    "    probs = probs.numpy()\n",
    "    idx = idx.numpy()\n",
    "    # output the prediction\n",
    "    for i in range(0, 5):\n",
    "        print('{:.3f} -> {}'.format(probs[i], classes[idx[i]]))\n",
    "    return idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = print_preds(logit)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def returnCAM(feature_conv, idx):\n",
    "    # input: tensor feature_conv of dim 1000*W*H and idx between 0 and 999\n",
    "    # output: image W*H with entries rescaled between 0 and 255 for the display\n",
    "    cam = feature_conv[idx].detach().numpy()\n",
    "    cam = cam - np.min(cam)\n",
    "    cam_img = cam / np.max(cam)\n",
    "    cam_img = np.uint8(255 * cam_img)\n",
    "    return cam_img"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#some utilities\n",
    "def pil_2_np(img_pil):\n",
    "    # transform a PIL image in a numpy array\n",
    "    return np.asarray(img_pil)\n",
    "\n",
    "def display_np(img_np):\n",
    "    imshow(Image.fromarray(np.uint8(img_np)))\n",
    "    \n",
    "def plot_CAM(img_np, CAM):\n",
    "    height, width, _ = img_np.shape\n",
    "    heatmap = cv2.applyColorMap(cv2.resize(CAM,(width, height)), cv2.COLORMAP_JET)\n",
    "    result = heatmap * 0.3 + img_np * 0.5\n",
    "    display_np(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# here is a fake example to see how things work\n",
    "img_np = pil_2_np(img_pil)\n",
    "diag_CAM = returnCAM(torch.eye(7).unsqueeze(0),0)\n",
    "plot_CAM(img_np,diag_CAM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here for your new network\n",
    "net_conv = \n",
    "# do not forget:\n",
    "net_conv = net_conv.eval()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# to test things are right\n",
    "x = torch.randn(5, 3, 224, 224)\n",
    "y = net_conv(x)\n",
    "y.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "logit_conv = net_conv(img_tensor.unsqueeze(0))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "logit_conv.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# transfor this to a [1,1000] tensor with AdaptiveAvgPool2d\n",
    "logit_new = "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = print_preds(logit_new)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "i = #index of lakeside\n",
    "CAM1 = returnCAM(logit_conv.squeeze(),idx[i])\n",
    "plot_CAM(img_np,CAM1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "i = #index of mountain bike\n",
    "CAM2 = returnCAM(logit_conv.squeeze(),idx[i])\n",
    "plot_CAM(img_np,CAM2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2- Adversarial examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this second part, we will look at [adversarial examples](https://arxiv.org/abs/1607.02533): \"An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to perform an attack on machine learning systems...\"\n",
    "\n",
    "Rules of the game:\n",
    "- the attacker cannot modify the classifier, i.e. the neural net with the preprocessing done on the image before being fed to the network. \n",
    "- even if the attacker cannot modify the classifier, we assume that the attacker knows the architecture of the classifier. Here, we will still work with `resnet18` and the standard Imagenet normalization. \n",
    "- the attacker can only modify the physical image fed into the network.\n",
    "- the attacker should fool the classifier, i.e. the label obtained on the corrupted image should not be the same as the label predicted on the original image.\n",
    "\n",
    "First, you will implement *Fast gradient sign method (FGSM)* which is described in Section 2.1 of [Adversarial examples in the physical world](https://arxiv.org/abs/1607.02533). The idea is simple, suppose you have an image $\\mathbf{x}$ and when you pass it through the network, you get the 'true' label $y$. You know that your network has been trained by minimizing the loss $J(\\mathbf{\\theta}, \\mathbf{x}, y)$ with respect to the parameters of the network $\\theta$. Now, $\\theta$ is fixed as you cannot modify the classifier so you need to modify $\\mathbf{x}$. In order to do so, you can compute the gradient of the loss with respect to $\\mathbf{x}$ i.e. $\\nabla_{\\mathbf{x}} J(\\mathbf{\\theta}, \\mathbf{x}, y)$ and use it as follows to get the modified image $\\tilde{\\mathbf{x}}$:\n",
    "$$\n",
    "\\tilde{\\mathbf{x}} = \\text{Clamp}\\left(\\mathbf{x} + \\epsilon *\n",
    "\\text{sign}(\\nabla_{\\mathbf{x}} J(\\mathbf{\\theta}, \\mathbf{x}, y)),0,1\\right),\n",
    "$$\n",
    "where $\\text{Clamp}(\\cdot, 0,1)$ ensures that $\\tilde{\\mathbf{x}}$ is a proper image.\n",
    "Note that if instead of sign, you take the full gradient, you are now following the gradient i.e. increasing the loss $J(\\mathbf{\\theta}, \\mathbf{x}, y)$ so that $y$ becomes less likely to be the predicted label."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validation:\n",
    "1. Implement this attack. Make sure to display the corrupted image.\n",
    "\n",
    "2. For what value of epsilon is your attack successful? What is the predicted class then?\n",
    "\n",
    "<span style=\"color:red\">Replace by your answer</span>\n",
    "\n",
    "3. plot the sign of the gradient and pass this image through the network. What prediction do you obtain? Compare to [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572) \n",
    "\n",
    "<span style=\"color:red\">Replace by your answer</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Image under attack!\n",
    "url_car = 'https://cdn130.picsart.com/263132982003202.jpg?type=webp&to=min&r=640'\n",
    "response = requests.get(url_car)\n",
    "img_pil = Image.open(io.BytesIO(response.content))\n",
    "imshow(img_pil);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# same as above\n",
    "preprocess = transforms.Compose([\n",
    "   transforms.Resize((224,224)),\n",
    "   transforms.ToTensor(),\n",
    "   normalize\n",
    "])\n",
    "\n",
    "for p in net.parameters():\n",
    "    p.requires_grad = False\n",
    "    \n",
    "x = preprocess(img_pil).clone().unsqueeze(0)\n",
    "logit = net(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = print_preds(logit)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_std = torch.from_numpy(np.array(n_std, dtype=np.float32)).view(-1, 1, 1)\n",
    "t_mean = torch.from_numpy(np.array(n_mean, dtype=np.float32)).view(-1, 1, 1)\n",
    "\n",
    "def plot_img_tensor(img):\n",
    "    imshow(np.transpose(img.detach().numpy(), [1,2,0]))\n",
    "\n",
    "def plot_untransform(x_t): \n",
    "    x_np = (x_t * t_std + t_mean).detach().numpy()\n",
    "    x_np = np.transpose(x_np, [1, 2, 0])\n",
    "    imshow(x_np)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# here we display an image given as a tensor\n",
    "x_img = (x * t_std + t_mean).squeeze(0)\n",
    "plot_img_tensor(x_img)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your implementation of the attack\n",
    "def fgsm_attack(image, epsilon, data_grad):\n",
    "    # Collect the element-wise sign of the data gradient\n",
    "    \n",
    "    # Create the perturbed image by adjusting each pixel of the input image\n",
    "    \n",
    "    # Adding clipping to maintain [0,1] range\n",
    "    \n",
    "    # Return the perturbed image\n",
    "    return perturbed_image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = 656 #minivan\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "x_img.requires_grad = True\n",
    "logit = net(normalize(x_img).unsqueeze(0))\n",
    "target = torch.tensor([idx])\n",
    "\n",
    "    #TODO: compute the loss to backpropagate\n",
    "\n",
    "_ = print_preds(logit)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your attack here\n",
    "epsilon = 0\n",
    "x_att = fgsm_attack(x_img,epsilon,?)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the new prediction for the corrupted image\n",
    "logit = net(normalize(x_att).unsqueeze(0))\n",
    "_ = print_preds(logit)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# can you see the difference?\n",
    "plot_img_tensor(x_att)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# do not forget to plot the sign of the gradient\n",
    "gradient = \n",
    "plot_img_tensor((1+gradient)/2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# what is the prediction for the gradient? \n",
    "logit = net(normalize(gradient).unsqueeze(0))\n",
    "_ = print_preds(logit)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3- Transforming a car into a cat\n",
    "\n",
    "We now implement the *Iterative Target Class Method (ITCM)* as defined by equation (4) in [Adversarial Attacks and Defences Competition](https://arxiv.org/abs/1804.00097)\n",
    "\n",
    "To test it, we will transform the car (labeled minivan by our `resnet18`) into a [Tabby cat](https://en.wikipedia.org/wiki/Tabby_cat) (classe 281 in Imagenet). But you can try with any other target."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validation:\n",
    "1. Implement the ITCM and make sure to display the resulting image. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = preprocess(img_pil).clone()\n",
    "xd = preprocess(img_pil).clone()\n",
    "xd.requires_grad = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = 281 #tabby\n",
    "optimizer = optim.SGD([xd], lr=0.01)\n",
    "\n",
    "for i in range(200):\n",
    "    #TODO: your code here\n",
    "    \n",
    "    optimizer.zero_grad()\n",
    "    loss.backward()\n",
    "    optimizer.step()\n",
    "    print(loss.item())\n",
    "    \n",
    "    _ = print_preds(output)\n",
    "    print(i,'-----------------')\n",
    "    \n",
    "    # TODO: break the loop once we are satisfied \n",
    "    if ?:\n",
    "        break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = print_preds(output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot the corrupted image\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4- Where is the cat hidden?\n",
    "\n",
    "Last, we use CAM to understand where the network see a cat in the image."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validation:\n",
    "1. display the CAM for the class tabby\n",
    "\n",
    "2. display the CAM for the class minivan\n",
    "\n",
    "3. where is the cat?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "kernelspec": {
   "display_name": "dldiy",
   "language": "python",
   "name": "dldiy"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
